ABSTRACT

In this paper we present a method, based on linear interpolation, for 
detecting and correcting bad data points in a set of data without 
contaminating the good data points. We are not concerned with the 
small random errors usually attributed to a noisy system and assume 
that the data points which are in error are relatively isolated from 
each other and that the number of such points is small compared to the 
total number of data points.
Data Smoothing and Error Detection
Based on Linear Interpolation

V. M. Guerra* and R. A. Tapia*

1. Introduction. In the handling of large sets of data
it is not uncommon to inadvertently introduce errors into
the data. Typical causes for the introduction of error
might be:

(a) Reading error; 

(b) Keypunch error; 

(c) Machine malfunction. 

In this paper we consider the problem of detecting and removing 
these errors without contaminating the good data. We are 
not concerned with the small random errors usually attributed 
to a noisy system. Therefore it seems reasonable to expect
that the data points which are in error are relatively
isolated from each other and that the number of such points
is small compared to the total number of data points;
however, the errors themselves will probably be quite large.

This latter consideration alone forces us to reject the
well-known averaging techniques for data smoothing [3], since
the bad data would significantly affect the good data.

* Department of Mathematical Sciences, Rice University, Houston, 
Texas 77001. This work was sponsored by NASA-MSC under 
contract NAS 9-12776. 





If we consider removing the errors by smoothing the
data using splines and least squares (see [4], [6], [7]
and [8]), then it is well-known that the $\ell_2$ norm (least
squares) is sensitive to outliers (hence, again our bad
points would influence our good points). This observation
immediately suggests the use of splines and the $\ell_1$ norm via
linear programming with differential inequality constraints
(see [2] and [5]). Our main reason for rejecting both the
$\ell_1$ and $\ell_2$ (as well as $\ell_\infty$) approaches is both obvious and
extremely realistic. Namely, for large data sets, such as
the remote sensing data presently being analyzed at NASA
Manned Spacecraft Center, the use of the $\ell_1$ or $\ell_2$ approach
would require a prohibitive amount of computer time and
computer storage and would undoubtedly lead to extreme
numerical instabilities. The amount of work required to
implement these two approaches grows faster than linearly in n, where
n is the number of data points. The approach we are about
to describe is of order n (i.e. the work increases linearly
with the data). Moreover, while we acknowledge the fact
that both the $\ell_1$ and $\ell_2$ approaches would probably give
satisfactory results for small data sets, we feel our approach
will do as well.

In this paper we consider only the one-dimensional problem. 
In subsequent papers we will extend our approach to higher 
dimensions and also consider using methods of interpolation 
more sophisticated than linear interpolation. 



2. The Linear Smoothing Algorithm. Consider a set of points
in the plane with equally spaced abscissas, say

$$A = \{(x_i, y_i) : i = 1, \ldots, m\}.$$

Definition 1. By an anchor point of the set A we mean a
point which is assumed to be correct and is not to be
smoothed.

Remark. We shall assume that $(x_1, y_1)$ and $(x_m, y_m)$ are
always anchor points of the set A.

Definition 2. By the point energy of the non-anchor point
$(x_i, y_i) \in A$ we mean the (ordinate) distance from the line
passing through the points $(x_{i-1}, y_{i-1})$ and $(x_{i+1}, y_{i+1})$
to the point $(x_i, y_i)$. If $(x_i, y_i)$ is an anchor point,
then its point energy is zero.

Definition 3. By the total energy of the set A we mean the sum
of the point energies of all points in A (i.e., the $L_1$-norm of
the point energies).

Definition 4. By the smoothness of the set A we mean the
largest point energy (i.e., the $L_\infty$-norm of the point energies).
Moreover, we say that A is $\epsilon$-smooth if the smoothness of A is
less than or equal to $\epsilon$.

Proposition 1. The following are equivalent:

(a) The set A is 0-smooth;

(b) The set A has zero total energy;

(c) The set A lies on the piecewise-linear function
which interpolates the anchor points of A.

Proof. The proof is straightforward.

Definition 4. By the normalized second difference at the
point $(x_i, y_i) \in A$ we mean

$$r_i = \tfrac{1}{2} y_{i+1} - y_i + \tfrac{1}{2} y_{i-1}, \qquad i = 2, \ldots, m-1.$$

Proposition 2. If $\alpha_i$ denotes the point energy of the
non-anchor point $(x_i, y_i) \in A$, then

$$\alpha_i = |r_i|.$$

Proof. The proof is not difficult.
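
The quantities in Definitions 2 to 4 and Proposition 2 can be illustrated with a short Python sketch. It assumes that only the two end points are anchors and uses 0-based indices, whereas the text counts from 1. The function names are illustrative and are not part of the original text.

```python
def second_differences(y):
    """Normalized second differences r_i = y[i+1]/2 - y[i] + y[i-1]/2.

    The two end points are treated as anchors, so their entries are 0.0.
    """
    r = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        r[i] = 0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1]
    return r


def point_energies(y):
    """Point energy of each point: |r_i|, and zero at the anchors."""
    return [abs(ri) for ri in second_differences(y)]


def total_energy(y):
    """Sum of the point energies (the L1 norm)."""
    return sum(point_energies(y))


def smoothness(y):
    """Largest point energy (the L-infinity norm)."""
    return max(point_energies(y))


y = [0.0, 1.0, 4.0, 9.0, 16.0]          # samples of x**2 at x = 0, ..., 4
print(point_energies(y))                # [0.0, 1.0, 1.0, 1.0, 0.0]
print(total_energy(y), smoothness(y))   # 3.0 1.0
```

In this small data set every interior point lies one unit below the chord joining its neighbors, so the set is 1-smooth with total energy 3.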

Definition 5. By the linear smoothing approach we mean
the transformation of the set A into an $\epsilon$-smooth set by
successive changes of the values of the ordinates of the
points with the largest point energies. Specifically, if
$1 \le k \le m$ is such that $\alpha_k = \max\{\alpha_i : 1 \le i \le m\}$ (if more than one such
k exists then we choose the one of smallest index), then
we change the point $(x_k, y_k)$ to the point $(x_k, y_k + \theta_k r_k)$
for some $\tfrac{1}{2} \le \theta_k \le 1$ and repeat the procedure until (hopefully)
the transformed set is $\epsilon$-smooth (for some given $\epsilon \ge 0$).

Remark. If $\theta_k = 0$, then the data is not modified. If $\theta_k = 1$,
then we are moving the point $(x_k, y_k)$ onto the line
interpolating its two neighbors; hence by requiring $\tfrac{1}{2} \le \theta_k \le 1$ we
have guaranteed that the point energy at the k-th point
will decrease at least by a factor of $\tfrac{1}{2}$.

Remark. For simplicity we may choose $\theta_k$ always equal to
a constant, e.g., $\tfrac{1}{2}$, $\tfrac{3}{4}$ or 1.
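
A minimal Python sketch of the linear smoothing approach of Definition 5 follows. It assumes a constant relaxation factor, treats only the two end points as anchors, and uses 0-based indices. The name `linear_smoothing` and the safety cap `max_iter` are hypothetical additions for illustration and do not appear in the text.

```python
def linear_smoothing(y, eps, theta=1.0, max_iter=100):
    """Sketch of the linear smoothing approach.

    y        : ordinates at equally spaced abscissas (end points are anchors)
    eps      : target smoothness
    theta    : constant relaxation factor, 1/2 <= theta <= 1
    max_iter : hypothetical safety cap on the number of moves
    Returns the smoothed ordinates and the 0-based indices of the moved points.
    """
    y = list(y)
    moved = []
    for _ in range(max_iter):
        # Normalized second differences; anchors keep the value 0.0.
        r = [0.0] * len(y)
        for i in range(1, len(y) - 1):
            r[i] = 0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1]
        energies = [abs(ri) for ri in r]
        largest = max(energies)
        if largest <= eps:
            break                       # the set is eps-smooth
        k = energies.index(largest)     # smallest index among ties
        y[k] += theta * r[k]            # move the point toward the chord
        moved.append(k)
    return y, moved


# A straight line with one bad ordinate at index 3.
data = [0.0, 1.0, 2.0, 8.0, 4.0, 5.0, 6.0]
smoothed, moved = linear_smoothing(data, eps=1e-9)
print(smoothed)   # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(moved)      # [3]
```

With a relaxation factor of one, the single bad point is placed back on the chord of its neighbors and none of the good points is touched.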

3. Convergence of the Linear Smoothing Algorithm. To distinguish
between the values of the point energies and other
quantities at different iterations a subscript, or a second
subscript (whatever the case may be) will be added whenever
necessary. For example $A_n$ will denote the set A at the n-th
iteration of the linear smoothing process. We also let
$A_0$ denote A.

Proposition 3. If $E_n$ denotes the total energy of $A_n$, then
$\{E_n\}$ is a monotone nonincreasing sequence. Moreover

$$E_{n+1} \le E_n - \tfrac{1}{2}\theta_k \alpha_{k,n}$$

(k denotes the index of the point in $A_n$ which
is to be modified) if either the (k-1)-th or (k+1)-th point is
an anchor point. Finally we have $E_{n+1} = E_n$ if and only if
$r_{k-1,n}$, $r_{k,n}$ and $r_{k+1,n}$ are of the same sign and the (k-1)-th
and (k+1)-th points are not anchor points.

Proof. All the point energies except possibly $\alpha_{k-1}$, $\alpha_k$ and
$\alpha_{k+1}$ are the same at the n-th and (n+1)-th iteration.
Moreover

$$y_{k,n+1} = y_{k,n} + \theta_k r_{k,n};$$

hence

$$\begin{aligned}
r_{k+1,n+1} &= r_{k+1,n} + \tfrac{1}{2}\theta_k r_{k,n}, \\
r_{k-1,n+1} &= r_{k-1,n} + \tfrac{1}{2}\theta_k r_{k,n}, \\
r_{k,n+1} &= (1 - \theta_k)\, r_{k,n}.
\end{aligned} \tag{1}$$

Now since $0 \le \theta_k \le 1$ we have

$$|r_{k,n+1}| = (1 - \theta_k)\,|r_{k,n}|.$$

Therefore taking absolute values, using the triangle
inequality and adding in (1) we have that $E_{n+1} \le E_n$.

Clearly if the (k-1)-th or the (k+1)-th point is an
anchor point we must have a decrease in the total energy
of at least $\tfrac{1}{2}\theta_k |r_{k,n}|$. Again from (1) we will have a
decrease if either $r_{k-1,n}$ or $r_{k+1,n}$ has a different sign
than $r_{k,n}$. This proves the proposition.
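
The update relations used in this proof can be checked numerically. The Python sketch below performs one smoothing step on arbitrary illustrative data with a relaxation factor of 3/4, then verifies the three second-difference relations and the fact that the total energy does not increase. The data, the helper name and the 0-based indexing are assumptions made for the illustration.

```python
import math


def second_differences(y):
    """r_i = y[i+1]/2 - y[i] + y[i-1]/2 at interior points, 0 at the end anchors."""
    r = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        r[i] = 0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1]
    return r


theta = 0.75                                  # any value in [1/2, 1]
y = [0.0, 2.0, -1.0, 3.0, 0.5, 1.0, 4.0]      # arbitrary illustrative data

r_old = second_differences(y)
energy_old = sum(abs(v) for v in r_old)
# First index carrying the largest point energy.
k = max(range(len(y)), key=lambda i: abs(r_old[i]))

y_new = list(y)
y_new[k] += theta * r_old[k]                  # one linear smoothing step
r_new = second_differences(y_new)
energy_new = sum(abs(v) for v in r_new)

# Relations for the two neighbors and for the moved point itself.
assert math.isclose(r_new[k + 1], r_old[k + 1] + 0.5 * theta * r_old[k])
assert math.isclose(r_new[k - 1], r_old[k - 1] + 0.5 * theta * r_old[k])
assert math.isclose(r_new[k], (1.0 - theta) * r_old[k])
assert energy_new <= energy_old

print(k, energy_old, energy_new)   # 2 12.0 6.75
```

Here both neighbors of the moved point have second differences of sign opposite to that of the moved point, so the energy decreases strictly, as the proposition predicts.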

Remark. Although the energy of $A_{n+1}$ may be equal to the
energy of $A_n$ (i.e., no decrease) it may happen that $A_{n+1}$
is significantly smoother than $A_n$. However a simple example
can be constructed to show that the smoothness (in contrast to
the energy) is not monotone nonincreasing; hence for certain
purposes the natural criterion (norm) to use is the energy.
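
One such simple example, chosen here for illustration and not taken from the text, is the parabola sampled at five equally spaced points. The Python sketch below moves the first interior point with a relaxation factor of one: the total energy drops while the largest point energy grows.

```python
def point_energies(y):
    """|y[i+1]/2 - y[i] + y[i-1]/2| at interior points, 0 at the end anchors."""
    e = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        e[i] = abs(0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1])
    return e


y = [0.0, 1.0, 4.0, 9.0, 16.0]        # x**2 at x = 0, ..., 4
before = point_energies(y)            # all interior energies tie at 1.0

# The smallest index among the ties is 1; with theta = 1 the point is
# moved onto the chord joining its two neighbors.
y_after = list(y)
y_after[1] = 0.5 * (y[0] + y[2])
after = point_energies(y_after)

print(sum(before), max(before))       # 3.0 1.0
print(sum(after), max(after))         # 2.5 1.5
```

The energy decreases from 3.0 to 2.5 because the moved point is adjacent to an anchor, yet the smoothness rises from 1.0 to 1.5 at the neighboring interior point.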

Proposition 4. If the total energy of the set $A_n$ is not
zero, then the maximum number of iterations that can occur
without decreasing this energy is bounded above by

$$B = \sum_{i=1}^{m} 2^{i}$$

(where m is the number of data points).

Proof. We will first show that if the energy is not decreased,
then we can only modify a particular point twice before moving
on to another point. Suppose we operate twice on the point
$(x_k, y_k)$. The result of the first iteration is given by (1)
and the result of the second is easily seen to be

$$\begin{aligned}
r_{k+1,n+2} &= r_{k+1,n} + \left(\tfrac{1}{2}\theta_k + \tfrac{1}{2}\theta_k(1 - \theta_k)\right) r_{k,n}, \\
r_{k-1,n+2} &= r_{k-1,n} + \left(\tfrac{1}{2}\theta_k + \tfrac{1}{2}\theta_k(1 - \theta_k)\right) r_{k,n}, \\
r_{k,n+2} &= (1 - \theta_k)^2\, r_{k,n}.
\end{aligned}$$

Now since the energy did not decrease we must have, by
Proposition 3, that $r_{k-1,n}$, $r_{k,n}$ and $r_{k+1,n}$ are all of
the same sign. Also, since $\tfrac{1}{2} \le \theta_k \le 1$ we have

$$\tfrac{1}{2}\theta_k \ge (1 - \theta_k)^2;$$

this shows that $|r_{k+1,n+2}| > |r_{k,n+2}|$. It follows that
$(x_k, y_{k,n+2})$ will not be modified on the subsequent
iteration. It is not difficult to show that we will move
one point in at most 2 iterations, 2 points in at most $2 + 2^2$
iterations and in general K points in at most $\sum_{i=1}^{K} 2^i$ iterations.
This proves the proposition.
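
The inequality between half the relaxation factor and the squared complement, used in the proof above, follows in one line from the admissible range of the relaxation factor. The short derivation below is added for the reader's convenience and is not part of the original proof.

```latex
\frac{1}{2} \le \theta_k \le 1
\;\Longrightarrow\;
0 \le 1 - \theta_k \le \frac{1}{2}
\;\Longrightarrow\;
(1 - \theta_k)^2 \le \frac{1}{4} \le \frac{1}{2}\,\theta_k .
```

Equality holds throughout only when the relaxation factor equals one half.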

Remark. The bound given in the previous proposition is far
from being sharp. It merely demonstrates an important fact
which will allow us to prove convergence.

Proposition 5. The sequence $\{E_n\}$ giving the total energy
at each iteration of the linear smoothing algorithm converges
to zero.

Proof. From Proposition 3, $\{E_n\}$ is a monotone nonincreasing
sequence which is bounded below by zero; therefore it converges.
Suppose $E_n \to E > 0$. First note that for each n = 1, 2, ...
there exists an integer $1 \le j(n) \le m$ such that

$$\alpha_{j(n),n} \ge \frac{E}{m}.$$

To see this suppose $\alpha_{i,n} < E/m$ for $1 \le i \le m$. Then

$$E_n = \sum_{i=1}^{m} \alpha_{i,n} < m \cdot \frac{E}{m} = E,$$

which contradicts Proposition 3. By Propositions 3 and 4, for some
integer $n < J(n) \le n + 2^m$ we have, with $i = J(n) - 1$, that

$$\begin{aligned}
E_{J(n)} &\le E_i - \tfrac{1}{2}\theta_k \alpha_{k,i} \\
         &\le E_n - \tfrac{1}{2}\theta_k \alpha_{j(i),i} \\
         &\le E_n - \frac{E}{4m}.
\end{aligned}$$

Now, since $E_n \ge E$ we have $|E_n - E| = E_n - E$; therefore given $\epsilon > 0$
there exists N > 0 such that $E_n - E < \epsilon$ whenever n > N. We have

$$E_{J(n)} - E \le E_n - \frac{E}{4m} - E < \epsilon - \frac{E}{4m}.$$

Now choosing $\epsilon < E/(4m)$ gives $E_{J(n)} < E$, which again contradicts
Proposition 3. This proves the proposition.

Definition 6. Let $A_n = \{(x_i, y_{i,n}) : i = 1, \ldots, m\}$ for n = 0, 1, 2, ....
We say that the sequence of sets $\{A_n\}$ converges to the set
$A^* = \{(x_i, y_i^*) : i = 1, \ldots, m\}$ if $y_{i,n} \to y_i^*$ for $i = 1, \ldots, m$.

Proposition 6. The total energy is a continuous functional,
i.e. if $A_n \to A^*$, then $E(A_n) \to E(A^*)$.

Proof. If $A_n \to A^*$, then the sequence of vectors $y_n = (y_{1,n}, \ldots, y_{m,n})$
converges to the vector $y^* = (y_1^*, \ldots, y_m^*)$ pointwise; hence in
any norm. Let $\alpha_{j,n}$ denote the point energy at the j-th point
of $A_n$, with a similar definition for $\alpha_j^*$. A simple construction
should convince the reader that

$$|\alpha_{j,n} - \alpha_j^*| \le 2\,\|y_n - y^*\|_\infty.$$

It follows that $\alpha_{j,n} \to \alpha_j^*$ and therefore $E(A_n) \to E(A^*)$. This
proves the proposition.
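
The bound on the change in point energy used in this proof can be checked numerically. In the Python sketch below the limit ordinates and the perturbation are arbitrary illustrative choices, and the two end points are assumed to be the only anchors. For each perturbation size, the largest change in any point energy is compared with twice the largest change in any ordinate.

```python
def point_energies(y):
    """|y[i+1]/2 - y[i] + y[i-1]/2| at interior points, 0 at the end anchors."""
    e = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        e[i] = abs(0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1])
    return e


y_star = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]      # illustrative limit ordinates
offsets = [0.0, 0.3, -0.2, 0.1, -0.4, 0.0]     # perturbation; anchors stay fixed
e_star = point_energies(y_star)

for scale in (1.0, 0.1, 0.01):
    y_n = [a + scale * d for a, d in zip(y_star, offsets)]
    sup_dist = max(abs(a - b) for a, b in zip(y_n, y_star))
    gap = max(abs(a - b) for a, b in zip(point_energies(y_n), e_star))
    # Largest energy change never exceeds twice the largest ordinate change.
    assert gap <= 2.0 * sup_dist
    print(scale, sup_dist, gap)
```

As the perturbation shrinks, the energy gap shrinks with it, which is the continuity statement of the proposition.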

Proposition 7. The linear smoothing algorithm converges,
i.e., the sequence of sets $\{A_n\}$ converges to a set $A^*$ with
total energy zero.

Proof. We use the same notation as in the proof of
Proposition 6. Clearly $\|y_n\|_\infty \le \|y_0\|_\infty$ for n = 1, 2, 3, .... Hence
$\{y_n\}$ must have a subsequence which is convergent, say to $y^*$.
If $A^*$ is the set corresponding to $y^*$, then by Proposition 5 and
Proposition 6 $E(A^*) = 0$. If the entire sequence does not
converge to $y^*$, then some neighborhood of $y^*$ excludes infinitely
many members of $\{y_n\}$. These excluded members must have a
convergent subsequence. If $y^{**}$ denotes this limit, then
$E(y^*) = E(y^{**}) = 0$; hence $y^* = y^{**}$; but this is a
contradiction. This proves the proposition.

Remark . We have spent considerable time and effort proving 
that the linear smoothing algorithm converges to a solution 
which could have been immediately written down. Of course the 
complete philosophy of this approach is that we only allow a 
few iterations. Indeed, as our examples will show, this 
philosophy is quite natural and analogous to what would be 
done by an artist or a loftsman by hand. Namely, the algorithm
converges very quickly to an acceptable solution and
from then on the convergence is extremely slow. Our main 
reason for proving convergence was to demonstrate that 
the algorithm will not oscillate. 
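
The limit that "could have been immediately written down" is the piecewise-linear interpolant of the anchor points. The Python sketch below, on a small illustrative data set with the two end points as the only anchors and a relaxation factor of one, runs many smoothing steps and compares the result with the straight line through the anchors. The helper name and the data are assumptions made for the illustration.

```python
def smooth_step(y, theta=1.0):
    """Perform one linear smoothing step in place.

    Returns the total energy measured before the step.
    """
    r = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        r[i] = 0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1]
    energies = [abs(v) for v in r]
    k = energies.index(max(energies))   # smallest index among ties
    y[k] += theta * r[k]                # r is 0.0 at anchors, so they never move
    return sum(energies)


y = [0.0, 3.0, -1.0, 4.0, 2.0]          # illustrative data; end points are anchors
history = [smooth_step(y) for _ in range(150)]

limit = [0.0, 0.5, 1.0, 1.5, 2.0]       # straight line through the two anchors
print(max(abs(a - b) for a, b in zip(y, limit)))   # distance to the limit
print(history[0], history[10], history[-1])        # energies along the way
```

The recorded energies never increase, and the iterates approach the straight line without oscillating between distinct limits.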
4. Examples. Consider the twenty points $A = \{(1, y(1)), \ldots, (20, y(20))\}$
taken from the graph of the cubic

$$y(x) = (x-7)(x-10)(x-13).$$

As it stands the set A is .0297-smooth. In our first example we
will introduce an error of .5 in the 11-th data point. We introduce
errors of -.5 in the 9-th point and .5 in the 11-th point for our
second example. Finally, the third example will consist of the points
of A with an error of -.5 in the 10-th point and .5 in the 11-th
point. Based on our theory we should expect the first two examples
to behave better than the third. Indeed, we obtain a curve as smooth
as the original curve in just one iteration for the first example and
in just two iterations for the second example; however the third
example requires five iterations. All the following calculations
were performed using a value of one for $\theta_k$.

Observe that all three examples show that the energy is monotone 
decreasing and that the smoothness is not monotone decreasing.
However these examples imply that a reasonable stopping criterion (since
we do not want to end up with a straight line) is to stop at the first 
iteration where the smoothness increases. In our examples this is the 
iteration at which the original smoothness is restored. 
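
The three examples and the stopping criterion can be sketched in Python as follows. The sketch assumes that the tabulated ordinates are the cubic divided by y(20) = 910, which is consistent with the tabulated values running from -0.71209 to 1.00000 but is not stated explicitly in the text. The function names and the cap `max_iter` are hypothetical. A step is rejected, and the process stopped, as soon as it would increase the smoothness.

```python
def energies(y):
    """Point energies; the two end points are anchors with energy 0."""
    e = [0.0] * len(y)
    for i in range(1, len(y) - 1):
        e[i] = abs(0.5 * y[i + 1] - y[i] + 0.5 * y[i - 1])
    return e


def smooth_until_rougher(y, theta=1.0, max_iter=50):
    """Apply smoothing steps, stopping before the first step that would
    increase the smoothness. Returns (smoothed ordinates, steps taken)."""
    y = list(y)
    steps = 0
    while steps < max_iter:
        e = energies(y)
        if max(e) == 0.0:
            break                                   # nothing left to smooth
        k = e.index(max(e))                         # smallest index among ties
        trial = list(y)
        trial[k] += theta * (0.5 * y[k + 1] - y[k] + 0.5 * y[k - 1])
        if max(energies(trial)) > max(e):
            break                                   # smoothness would increase
        y, steps = trial, steps + 1
    return y, steps


# Cubic sampled at x = 1, ..., 20 and divided by y(20) = 910 (assumed scaling).
clean = [(x - 7) * (x - 10) * (x - 13) / 910.0 for x in range(1, 21)]


def with_errors(errors):
    """errors maps a 1-based point number to the error added to its ordinate."""
    y = list(clean)
    for point, err in errors.items():
        y[point - 1] += err
    return y


examples = {
    1: with_errors({11: 0.5}),
    2: with_errors({9: -0.5, 11: 0.5}),
    3: with_errors({10: -0.5, 11: 0.5}),
}
steps_taken = {}
for number, y in examples.items():
    e = energies(y)
    _, steps_taken[number] = smooth_until_rougher(y)
    print(number, round(sum(e), 5), round(max(e), 5), steps_taken[number])
```

Under these assumptions the sketch should stop after one, two and five steps for the three examples, in agreement with the iteration counts quoted above; the printed energies may differ from the tabulated ones in the last digit because of rounding.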

The following tables and graphs are reasonably self-explanatory; 
however we point out that the values for the energy and smoothness 
were calculated at the beginning of each iteration and not at the
end.



EXAMPLE 1

  I       X           Y        Y WITH ERROR
  1    0.00000    -0.71209      -0.71209
  2    0.05000    -0.48352      -0.48352
  3    0.10000    -0.30769      -0.30769
  4    0.15000    -0.17802      -0.17802
  5    0.20000    -0.08791      -0.08791
  6    0.25000    -0.03077      -0.03077
  7    0.30000     0.00000       0.00000
  8    0.35000     0.01099       0.01099
  9    0.40000     0.00879       0.00879
 10    0.45000     0.00000       0.00000
 11    0.50000    -0.00879       0.49121   error
 12    0.55000    -0.01099      -0.01099
 13    0.60000     0.00000       0.00000
 14    0.65000     0.03077       0.03077
 15    0.70000     0.08791       0.08791
 16    0.75000     0.17802       0.17802
 17    0.80000     0.30769       0.30769
 18    0.85000     0.48352       0.48352
 19    0.90000     0.71209       0.71209
 20    0.95000     1.00000       1.00000

 ITER    ENERGY    SMOOTHNESS    POINT MOVED
   1    1.26044     0.49670          11
   2    0.26703     0.02967          19
   3    0.25220     0.04121          18
   4    0.25220     0.04368          17
   5    0.25220     0.04162          16
   6    0.25220     0.03729          15
   7    0.25220     0.03183          14
   8    0.25220     0.02637           2
   9    0.23901     0.03626           3
  10    0.23901     0.03791           4








EXAMPLE 2

  I       X           Y        Y WITH ERROR
  1    0.00000    -0.71209      -0.71209
  2    0.05000    -0.48352      -0.48352
  3    0.10000    -0.30769      -0.30769
  4    0.15000    -0.17802      -0.17802
  5    0.20000    -0.08791      -0.08791
  6    0.25000    -0.03077      -0.03077
  7    0.30000     0.00000       0.00000
  8    0.35000     0.01099       0.01099
  9    0.40000     0.00879      -0.49121   error
 10    0.45000     0.00000       0.00000
 11    0.50000    -0.00879       0.49121   error
 12    0.55000    -0.01099      -0.01099
 13    0.60000     0.00000       0.00000
 14    0.65000     0.03077       0.03077
 15    0.70000     0.08791       0.08791
 16    0.75000     0.17802       0.17802
 17    0.80000     0.30769       0.30769
 18    0.85000     0.48352       0.48352
 19    0.90000     0.71209       0.71209
 20    0.95000     1.00000       1.00000

 ITER    ENERGY    SMOOTHNESS    POINT MOVED
   1    1.75384     0.49670           9
   2    1.25714     0.49670          11
   3    0.26374     0.02967          19
   4    0.24890     0.04121          18
   5    0.24890     0.04368          17
   6    0.24890     0.04162          16
   7    0.24890     0.03729          15
   8    0.24890     0.03183          14
   9    0.24890     0.02637           2
  10    0.23571     0.03626           3



[Graph: LINEAR SMOOTHING; SMOOTHNESS = 0.4967; 1 ITERATIONS]













EXAMPLE 3

  I       X           Y        Y WITH ERROR
  1    0.00000    -0.71209      -0.71209
  2    0.05000    -0.48352      -0.48352
  3    0.10000    -0.30769      -0.30769
  4    0.15000    -0.17802      -0.17802
  5    0.20000    -0.08791      -0.08791
  6    0.25000    -0.03077      -0.03077
  7    0.30000     0.00000       0.00000
  8    0.35000     0.01099       0.01099
  9    0.40000     0.00879       0.00879
 10    0.45000     0.00000      -0.50000   error
 11    0.50000    -0.00879       0.49121   error
 12    0.55000    -0.01099      -0.01099
 13    0.60000     0.00000       0.00000
 14    0.65000     0.03077       0.03077
 15    0.70000     0.08791       0.08791
 16    0.75000     0.17802       0.17802
 17    0.80000     0.30769       0.30769
 18    0.85000     0.48352       0.48352
 19    0.90000     0.71209       0.71209
 20    0.95000     1.00000       1.00000

 ITER    ENERGY    SMOOTHNESS    POINT MOVED
   1    2.26044     0.75000          10
   2    1.00385     0.37170          11
   3    0.63214     0.18585          10
   4    0.44629     0.09293          11
   5    0.35337     0.04646          10
   6    0.30690     0.02967          19
   7    0.29207     0.04121          18
   8    0.29207     0.04368          17
   9    0.29207     0.04162          16
  10    0.29207     0.03729          15